The Characteristics, Uses, and Biases of Studies Related to Malignancies Using Google Trends: Systematic Review

Background: The internet is a primary source of health information for patients, supplementing physician care. Google Trends (GT), a popular tool, allows the exploration of public interest in health-related phenomena. Despite the growing volume of GT studies, no review has focused explicitly on oncology, creating a need for a systematic review to bridge this gap.
Objective: We aimed to systematically characterize studies related to oncology using GT to describe its utilities and biases.
Methods: We included all studies that used GT to analyze Google searches related to malignancies. We excluded studies written in languages other than English. The search was performed using the PubMed engine on August 1, 2022. We used the following search input: “Google trends” AND (“oncology” OR “cancer” OR “malignancy” OR “tumor” OR “lymphoma” OR “multiple myeloma” OR “leukemia”). We analyzed sources of bias that included using search terms instead of topics, lack of comparison of GT statistics with real-world data, and absence of sensitivity analysis. We performed descriptive statistics.
Results: A total of 85 articles were included. The first study using GT for oncology research was published in 2013, and since then, the number of publications has increased annually. The studies were categorized as follows: 22% (19/85) were related to prophylaxis, 20% (17/85) pertained to awareness events, 11% (9/85) were celebrity-related, 13% (11/85) were related to COVID-19, and 47% (40/85) fell into other categories. The most frequently analyzed cancers were breast (n=28), prostate (n=26), lung (n=18), and colorectal cancers (n=18). We discovered that of the 85 studies, 17 (20%) acknowledged using GT topics instead of search terms, 79 (93%) disclosed all search input details necessary for replicating their results, and 34 (40%) compared GT statistics with real-world data. The most prevalent methods for analyzing the GT data were correlation analysis (55/85, 65%) and peak analysis (43/85, 51%). The authors of only 11% (9/85) of the studies performed a sensitivity analysis.
Conclusions: The number of studies related to oncology using GT data has increased annually. The studies included in this systematic review cover a variety of topics, search strategies, and statistical methodologies. The most frequently analyzed cancers were breast, prostate, lung, colorectal, skin, and cervical cancers, potentially reflecting their prevalence in the population or public interest. Although most researchers provided reproducible search inputs, only one-fifth used GT topics instead of search terms, and many studies lacked a sensitivity analysis. Scientists using GT for medical research should ensure the quality of studies by providing a transparent search strategy to reproduce results, preferring to use topics over search terms, and performing robust statistical calculations coupled with sensitivity analysis.


Rationale
3 Describe the rationale for the review in the context of existing knowledge. To date, a few reviews have characterized medical research utilizing GT. None of them has focused explicitly on oncology. This systematic review may reveal the utility of GT in cancer-related studies, its limitations, and good practices for future studies.
The PubMed search input was the following: "Google trends" AND ("oncology" OR "cancer" OR "malignancy" OR "tumor" OR "lymphoma" OR "multiple myeloma" OR "leukemia").
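The boolean structure of this search input can be sketched in a few lines of Python. This is only an illustration of how the quoted string is composed (one required phrase combined with a block of alternative terms); the function name `build_pubmed_query` is hypothetical, and the sketch does not query PubMed.

```python
# Terms copied from the search input quoted above.
REQUIRED_PHRASE = "Google trends"
CANCER_TERMS = [
    "oncology", "cancer", "malignancy", "tumor",
    "lymphoma", "multiple myeloma", "leukemia",
]


def build_pubmed_query(required_phrase, any_of_terms):
    """Compose '"phrase" AND ("term1" OR "term2" ...)' (hypothetical helper)."""
    # Each alternative term is quoted and the terms are joined with OR.
    or_block = " OR ".join(f'"{term}"' for term in any_of_terms)
    # The required phrase is combined with the OR block using AND.
    return f'"{required_phrase}" AND ({or_block})'


query = build_pubmed_query(REQUIRED_PHRASE, CANCER_TERMS)
print(query)
```

Keeping the terms in a list makes it easy to report the exact input and to rerun the search with an extended term list.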
Selection process 8 Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process.
Two authors (J.C. and M.K.) independently screened titles and abstracts of potential articles. All discrepancies were resolved by PS. The PRISMA flow diagram is presented in Figure 1. We obtained n = 120 records for title and abstract screening. We then excluded n = 33 records, n = 2 were additionally excluded because the full text was inaccessible, and n = 85 articles were read in full. Finally, we included n = 85 articles in the final review.
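The record counts reported above can be checked with simple arithmetic. The following minimal sketch uses a hypothetical `PrismaFlow` class (not part of the review's workflow) to confirm that the numbers of screened and excluded records are consistent with the number of articles read in full.

```python
from dataclasses import dataclass


@dataclass
class PrismaFlow:
    """Hypothetical container for the counts reported in the flow diagram."""
    records_screened: int
    excluded_at_screening: int
    excluded_no_full_text: int

    @property
    def read_in_full(self) -> int:
        # Articles remaining after both exclusion steps.
        return (self.records_screened
                - self.excluded_at_screening
                - self.excluded_no_full_text)


# Counts taken from the text: 120 screened, 33 excluded, 2 without full text.
flow = PrismaFlow(records_screened=120,
                  excluded_at_screening=33,
                  excluded_no_full_text=2)
print(flow.read_in_full)
```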
Data collection process 9 Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process.
The included articles were carefully read and analyzed by one of the authors (J.C.), who retrieved all essential details. We collected the following details: title, DOI, authors' countries, year of publication, aim(s), aims classification, study category (prophylaxis, awareness month, celebrity death, COVID-19, others), malignancies considered, analysis of topics or search terms, regions and period analyzed, statistical method used, reproducibility, comparison with real-world data, principal findings, and sensitivity analysis.
Data items 10a List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (e.g. for all measures, time points, analyses), and if not, the methods used to decide which results to collect.
We classified aims (causal inference, description, surveillance, other) based on the classification used in the review of Nuti et al. After the initial screening of the included articles, we divided all studies into five specific categories: related to a prophylaxis method, related to an awareness event (e.g., Pink October), related to a celebrity (e.g., death or announcement of a cancer diagnosis), related to the COVID-19 pandemic, and others. Reproducibility of a study was defined as the authors' provision of the search input details necessary to reproduce the outcomes (search terms/topics used, region analyzed, period analyzed, and, if the region was set to "Worldwide", information about the inclusion of countries with low search volume).
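The reproducibility criterion can be expressed as a simple completeness check. The sketch below is one possible encoding, not the procedure the reviewers actually used: the `SearchInput` class, its field names, and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchInput:
    """Hypothetical record of the GT search details a study reports."""
    query: Optional[str]    # search term or topic used
    region: Optional[str]   # e.g., "Worldwide" or a country name
    period: Optional[str]   # analyzed time range
    # None means the study does not say how low-volume countries were handled.
    low_volume_countries_reported: Optional[bool] = None


def is_reproducible(s: SearchInput) -> bool:
    """Return True when every detail required by the definition is reported."""
    # Query, region, and period must always be stated.
    if any(v is None or v == "" for v in (s.query, s.region, s.period)):
        return False
    # For worldwide analyses, handling of low-volume countries must be stated.
    if s.region == "Worldwide" and s.low_volume_countries_reported is None:
        return False
    return True


# Made-up example records, for illustration only.
complete = SearchInput("Breast cancer (topic)", "Worldwide", "2004-2021", True)
missing_low_volume = SearchInput("lung cancer", "Worldwide", "2010-2020")
single_country = SearchInput("colonoscopy", "United States", "2015-2020")

print(is_reproducible(complete))
print(is_reproducible(missing_low_volume))
print(is_reproducible(single_country))
```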
10b List and define all other variables for which data were sought (e.g. participant and intervention characteristics, funding sources). Describe any assumptions made about any missing or unclear information.
The complete characteristics of the studies are available in Supplementary File 2.

Section and Topic Item # Checklist item Location where item is reported
Study risk of bias assessment 11 Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and if applicable, details of automation tools used in the process.
Two authors (J.C. and M.K.) independently screened titles and abstracts of potential articles. All discrepancies were resolved by PS. (…) Reproducibility of a study was defined as the authors' provision of the search input details necessary to reproduce the outcomes (search terms/topics used, region analyzed, period analyzed, and, if the region was set to "Worldwide", information about the inclusion of countries with low search volume).
Effect measures 12 Specify for each outcome the effect measure(s) (e.g. risk ratio, mean difference) used in the synthesis or presentation of results.
We performed descriptive statistics. All categorical variables were reported as numbers (percentages), and all numerical variables as medians (interquartile ranges).
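As an illustration of these two summary formats, the following sketch uses only the Python standard library. The review itself used R, so this is an equivalent calculation rather than the authors' code; the helper names are hypothetical, and the numerical example values are made up.

```python
from statistics import median, quantiles


def n_percent(count, total, digits=1):
    """Format a categorical variable as 'n (percentage)'."""
    return f"{count} ({100 * count / total:.{digits}f}%)"


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a numerical variable."""
    # The "inclusive" method treats the data as the whole population of interest.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Counts from the review: 19 of 85 studies were related to prophylaxis.
print(n_percent(19, 85))

# Made-up numerical values, for illustration only.
print(median_iqr([1, 2, 3, 4, 5]))
```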
Synthesis methods 13a Describe the processes used to decide which studies were eligible for each synthesis (e.g. tabulating the study intervention characteristics and comparing against the planned groups for each synthesis (item #5)).
The PubMed search input was the following: "Google trends" AND ("oncology" OR "cancer" OR "malignancy" OR "tumor" OR "lymphoma" OR "multiple myeloma" OR "leukemia"). Two authors (J.C. and M.K.) independently screened titles and abstracts of potential articles. All discrepancies were resolved by PS. (…) We obtained n = 120 records for title and abstract screening. We then excluded n = 33 records, n = 2 were additionally excluded because the full text was inaccessible, and n = 85 articles were read in full. Finally, we included n = 85 articles in the final review.
13b Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics, or data conversions.
We performed descriptive statistics. All categorical variables were reported as numbers (percentages), and all numerical variables as medians (interquartile ranges). Data manipulation, calculations, and visualizations were performed using the R programming language, version 3.6.1 (R Foundation, Vienna, Austria).
13c Describe any methods used to tabulate or visually display results of individual studies and syntheses.
Data manipulation, calculations, and visualizations were performed using the R programming language, version 3.6.1 (R Foundation, Vienna, Austria).
13d Describe any methods used to synthesize results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used.
We performed descriptive statistics. All categorical variables were reported as numbers (percentages), and all numerical variables as medians (interquartile ranges).
13e Describe any methods used to explore possible causes of heterogeneity among study results (e.g. subgroup analysis, meta-regression).
Studies utilizing GT are retrospective and involve aggregated data of Google users without details such as sex and age. Nuti et al. address two sources of potential bias in GT studies: search strategy and validation of the studies. Here, we analyzed the following sources of bias: 1) using search terms instead of topics, 2) lack of comparison with real-world data, and 3) absence of a sensitivity analysis. We performed descriptive statistics. All categorical variables were reported as numbers (percentages), and all numerical variables as medians (interquartile ranges). Data manipulation, calculations, and visualizations were performed using the R programming language, version 3.6.1 (R Foundation, Vienna, Austria).
13f Describe any sensitivity analyses conducted to assess robustness of the synthesized results.
Here, we analyzed the following sources of bias: 1) using search terms instead of topics, 2) lack of comparison with real-world data, and 3) absence of a sensitivity analysis.

Reporting bias assessment
14 Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases).
Here, we analyzed the following sources of bias: 1) using search terms instead of topics, 2) lack of comparison with real-world data, and 3) absence of a sensitivity analysis.
Certainty assessment 15 Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome.
Studies utilizing GT are retrospective and involve aggregated data of Google users without details such as sex and age. Nuti et al. address two sources of potential bias in GT studies: search strategy and validation of the studies. Here, we analyzed the following sources of bias: 1) using search terms instead of topics, 2) lack of comparison with real-world data, and 3) absence of a sensitivity analysis. We performed descriptive statistics. All categorical variables were reported as numbers (percentages), and all numerical variables as medians (interquartile ranges). Data manipulation, calculations, and visualizations were performed using the R programming language, version 3.6.1 (R Foundation, Vienna, Austria).

Study selection 16a
Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram.
The PRISMA flow diagram is presented in Figure 1. We obtained n = 120 records for title and abstract screening. We then excluded n = 33 records, n = 2 were additionally excluded because the full text was inaccessible, and n = 85 articles were read in full. Finally, we included n = 85 articles in the final review. (…) We included 85 articles in our final analysis.
16b Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded.
We obtained n = 120 records for title and abstract screening. We then excluded n = 33 records, n = 2 were additionally excluded because the full text was inaccessible, and n = 85 articles were read in full. Finally, we included n = 85 articles in the final review. (…) We included 85 articles in our final analysis. Ten examples of included studies are presented in Table 1.

Study characteristics
17 Cite each included study and present its characteristics. The complete characteristics of the studies are available in Supplementary File 2.
(…) The first cancer-related study utilizing GT was published in 2013, and in the following years, the yearly number of publications rapidly increased and reached n = 20 in both 2020 and 2021 (Figure 2). A total of n = 23 (27.1%) studies were published in journals dedicated to oncology, n = 9 (10.6%) in journals on public health issues, n = 8 (9.4%) in dermatological journals, n = 18 (21.2%) in journals with a broad scope (e.g., PLoS One, BMJ Open, Cureus), and n = 28 (32.9%) in other types of journals. N = 32 (37.6%) articles were published in open-access journals. Over half of the studies were descriptive, mainly characterizing temporal and regional trends (Figure 3). For example, Zhang et al. described the seasonal interest of Google users in "tobacco" and "lung cancer" in Australia, New Zealand, the United Kingdom, and the United States. Approximately 30% of the studies were surveillance studies. An example of a surveillance study is Brazilian research assessing Google searches related to breast cancer and mammograms and their association with the number of cases and mammograms performed in Brazilian states. Finally, one in six papers assessed causal inference between specific events and Google searches. For instance, Noar et al. analyzed the effects of public figures' announcements of diagnosis and/or death due to pancreatic cancer on Google queries (Table 1). We classified n = 19 (22.4%) studies as related to prophylaxis, n = 17 (20.0%) as related to an awareness event, n = 9 (10.6%) as celebrity-related, n = 11 (12.9%) as related to COVID-19, and n = 40 (47.1%) as others (Figure 4). An example of a study on prophylaxis is that of Kamiński et al., who found an association between the burden of colorectal cancer and the interest of Google users in colonoscopy in n = 60 countries.
Several studies analyzed the association between GT statistics and awareness months, e.g., Pink October, the breast cancer awareness month, and March, the colorectal cancer awareness month in the United States.
Announcements of the death of celebrities due to cancer may increase the number of Google queries on cancer, as was reported after the death of Chadwick Boseman due to colorectal cancer (Table 1). In recent years, many researchers have analyzed the effects of the COVID-19 pandemic on the interest of Google users in many malignancies. For example, Adelhoefer et al. found that interest in many malignancies decreased during the pandemic's first months.

Risk of bias in studies
18 Present assessments of risk of bias for each included study. The first cancer-related study utilizing GT was published in 2013, and in the following years, the yearly number of publications rapidly increased and reached n = 20 in both 2020 and 2021 (Figure 2). A total of n = 23 (27.1%) studies were published in journals dedicated to oncology, n = 9 (10.6%) in journals on public health issues, n = 8 (9.4%) in dermatological journals, n = 18 (21.2%) in journals with a broad scope (e.g., PLoS One, BMJ Open, Cureus), and n = 28 (32.9%) in other types of journals. N = 32 (37.6%) articles were published in open-access journals. Over half of the studies were descriptive, mainly characterizing temporal and regional trends (Figure 3). For example, Zhang et al. described the seasonal interest of Google users in "tobacco" and "lung cancer" in Australia, New Zealand, the United Kingdom, and the United States. Approximately 30% of the studies were surveillance studies. An example of a surveillance study is Brazilian research assessing Google searches related to breast cancer and mammograms and their association with the number of cases and mammograms performed in Brazilian states. Finally, one in six papers assessed causal inference between specific events and Google searches. For instance, Noar et al. analyzed the effects of public figures' announcements of diagnosis and/or death due to pancreatic cancer on Google queries (Table 1). We classified n = 19 (22.4%) studies as related to prophylaxis, n = 17 (20.0%) as related to an awareness event, n = 9 (10.6%) as celebrity-related, n = 11 (12.9%) as related to COVID-19, and n = 40 (47.1%) as others (Figure 4). An example of a study on prophylaxis is that of Kamiński et al., who found an association between the burden of colorectal cancer and the interest of Google users in colonoscopy in n = 60 countries.
Several studies analyzed the association between GT statistics and awareness months, e.g., Pink October, the breast cancer awareness month, and March, the colorectal cancer awareness month in the United States. Announcements of the death of celebrities due to cancer may increase the number of Google queries on cancer, as was reported after the death of Chadwick Boseman due to colorectal cancer (Table 1). In recent years, many researchers have analyzed the effects of the COVID-19 pandemic on the interest of Google users in many malignancies. For example, Adelhoefer et al. found that interest in many malignancies decreased during the pandemic's first months. The most frequently analyzed cancers were breast cancer (n = 28), prostate cancer (n = 26), lung cancer (n = 18), colorectal cancer (n = 18), skin cancers (n = 16), and cervical cancer (n = 14) (Figure 5). We found that n = 17 (20.0%) of the studies reported using GT topics instead of search terms. Overall, n = 24 (28.2%) studies involved GT statistics for all available countries, n = 33 (38.8%) only for the United States, and n = 30 (35.3%) for other countries or combinations of countries (including combinations with the United States). A total of n = 40 (47.1%) studies utilized the longest period available in GT, starting from January 1, 2004. (…) Table 1. In n = 47 (55.3%) articles, at least one author came from the United States. Further, at least one author came from Australia, Brazil, or the Philippines in n = 5 articles each; from Germany, Ireland, Turkey, or the United Kingdom in n = 4 each; from China, Italy, Japan, or Austria in n = 3 each; from Canada, France, India, or Malaysia in n = 2 each; and from Iran, Kuwait, the Netherlands, New Zealand, Poland, Portugal, Romania, or Spain in n = 1 each. No author came from Africa or Central America. (…) Authors of only n = 9 (10.6%) studies performed a sensitivity analysis. Most of these studies compared the relative search volume (RSV) of the analyzed terms with that of other search terms, both medical and nonmedical. Another convincing sensitivity analysis involved comparing results between different countries or between Northern and Southern Hemisphere countries.
20b Present results of all statistical syntheses conducted. If meta-analysis was done, present for each the summary estimate and its precision (e.g. confidence/credible interval) and measures of statistical heterogeneity. If comparing groups, describe the direction of the effect.
A total of n = 23 (27.1%) studies were published in journals dedicated to oncology, n = 9 (10.6%) in journals on public health issues, n = 8 (9.4%) in dermatological journals, n = 18 (21.2%) in journals with a broad scope (e.g., PLoS One, BMJ Open, Cureus), and n = 28 (32.9%) in other types of journals. N = 32 (37.6%) articles were published in open-access journals. Over half of the studies were descriptive, mainly characterizing temporal and regional trends (Figure 3). For example, Zhang et al. described the seasonal interest of Google users in "tobacco" and "lung cancer" in Australia, New Zealand, the United Kingdom, and the United States. Approximately 30% of the studies were surveillance studies. An example of a surveillance study is Brazilian research assessing Google searches related to breast cancer and mammograms and their association with the number of cases and mammograms performed in Brazilian states. Finally, one in six papers assessed causal inference between specific events and Google searches. For instance, Noar et al. analyzed the effects of public figures' announcements of diagnosis and/or death due to pancreatic cancer on Google queries (Table 1). We classified n = 19 (22.4%) studies as related to prophylaxis, n = 17 (20.0%) as related to an awareness event, n = 9 (10.6%) as celebrity-related, n = 11 (12.9%) as related to COVID-19, and n = 40 (47.1%) as others (Figure 4). An example of a study on prophylaxis is that of Kamiński et al., who found an association between the burden of colorectal cancer and the interest of Google users in colonoscopy in n = 60 countries. Several studies analyzed the association between GT statistics and awareness months, e.g., Pink October, the breast cancer awareness month, and March, the colorectal cancer awareness month in the United States.
Announcements of the death of celebrities due to cancer may increase the number of Google queries on cancer, as was reported after the death of Chadwick Boseman due to colorectal cancer (Table 1). In recent years, many researchers have analyzed the effects of the COVID-19 pandemic on the interest of Google users in many malignancies. For example, Adelhoefer et al. found that interest in many malignancies decreased during the pandemic's first months. The most frequently analyzed cancers were breast cancer (n = 28), prostate cancer (n = 26), lung cancer (n = 18), colorectal cancer (n = 18), skin cancers (n = 16), and cervical cancer (n = 14) (Figure 5). We found that n = 17 (20.0%) of the studies reported using GT topics instead of search terms. Overall, n = 24 (28.2%) studies involved GT statistics for all available countries, n = 33 (38.8%) only for the United States, and n = 30 (35.3%) for other countries or combinations of countries (including combinations with the United States). A total of n = 40 (47.1%) studies utilized the longest period available in GT, starting from January 1, 2004. We found that most studies [n = 79 (92.9%)] provided all search input details needed to reproduce their results. Overall, n = 34 (40.0%) studies compared GT statistics with real-world data. Almost two-thirds of the studies performed correlation analysis [n = 55 (64.7%)], and over half performed peak analysis [n = 43 (50.6%)]. Less popular statistical approaches included secular trend analysis [n = 9 (10.6%)], seasonal trend analysis [n = 6 (7.1%)], forecasting [n = 4 (4.7%)], and others [n = 24 (28.2%)]. Authors of only n = 9 (10.6%) studies performed a sensitivity analysis. Most of these studies compared the relative search volume (RSV) of the analyzed terms with that of other search terms, both medical and nonmedical. Another convincing sensitivity analysis involved comparing results between different countries or between Northern and Southern Hemisphere countries.

20c Present results of all investigations of possible causes of heterogeneity among study results.
We found that n = 17 (20.0%) of the studies reported using GT topics instead of search terms. Overall, n = 24 (28.2%) studies involved GT statistics for all available countries, n = 33 (38.8%) only for the United States (…). Authors of only n = 9 (10.6%) studies performed a sensitivity analysis. Most of these studies compared the relative search volume (RSV) of the analyzed terms with that of other search terms, both medical and nonmedical. Another convincing sensitivity analysis involved comparing results between different countries or between Northern and Southern Hemisphere countries.
Reporting biases 21 Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed.
Not applicable.

Discussion
23a Provide a general interpretation of the results in the context of other evidence.
Our study found that the number of studies utilizing GT in oncology has been increasing steadily. Most of the included papers come from the Western world, and we did not find any papers by authors from Africa or Central America. Nevertheless, many of the available studies successfully use data from the Global South, e.g., Africa. Previous studies observed that many developing countries have a low search volume for many popular health topics, such as pain or diets. This could be related to lower access to the Internet in the Global South. Furthermore, developing countries have far less productive research facilities than the Western world. However, the number of people using the Internet is increasing fastest in developing countries. Therefore, we may expect more data about Google searches in these regions to become available soon.
The included articles were published in journals with various scopes. Moreover, approximately one in three papers was published in an open-access journal. It is tempting to hypothesize that articles using GT are usually regarded as a curiosity and are often published in wide-scope journals. In our systematic review, we utilized the classification of GT articles used previously by Nuti et al. Here, we observed that most of the oncological studies utilizing GT were descriptive, fewer than 20% analyzed causal inference, and up to 30% were surveillance studies. Nuti et al. reported that 39% of studies used GT for description, 34% for surveillance, and 27% for causal inference. The proportions in the two reviews seem similar. However, the review by Nuti et al. included all GT studies on health phenomena from 2009 to 2013, while we had narrower inclusion criteria but included all studies until mid-2022. Nuti et al. defined descriptive studies as "aimed to describe temporal or geographic trends and general relationships, without reference to a hypothesized causal relationship". For this reason, we assume that descriptive studies are generally easier for researchers to perform, which explains their popularity. The analyzed studies vary in terms of study category, which suggests many ideas for utilizing GT in oncology-related studies. A large group of papers took the form of prophylaxis studies. They mainly focused on assessing the interest in oncological screenings and a wide range of preventive activities and risk factors of common neoplasms. Most studies analyzed the interest of Google users in cancer screenings in the USA. These studies concerned, among others, topics related to the prevention of melanoma and various risk factors of skin cancer known from the literature.
(…) An interesting direction in using GT was studying the relationship between a celebrity's cancer diagnosis and global interest in that particular malignancy. In general, the diagnosis, and especially the premature death, of a celebrity, combined with its announcement in the media, significantly increases interest in a given cancer. Examples of such relationships were the increased interest in colorectal cancer and colon cancer screening associated with the death of Chadwick Boseman (actor), the diagnosis and death of Patrick Swayze (actor) and Steve Jobs (Apple co-founder) due to pancreatic cancer, and the diagnosis of prostate cancer in Ben Stiller (actor). Cancer-related deaths are not the only celebrity deaths analyzed in the scientific literature. Another example is the death of Harold Allen Ramis (actor) due to complications of autoimmune inflammatory vasculitis.
Another direction was to use GT to examine whether social campaigns or cancer awareness months are associated with fluctuations in the interest of Google users in specific malignancies. GT is a widely available, free tool to check the importance of health promotion events. Data from several papers showed a correlation between cancer awareness campaigns and increased public interest in oncological screening. Such observations have been reported for Pink October and breast screening in Malaysia. Furthermore, several studies have confirmed that events such as Breast Cancer Awareness Month were associated with increased interest in this topic among Google users. Studies confirmed that, compared to other months, there was an increase in searches for topics related to breast cancer and mammography in October in all countries. However, similar campaigns for lung or prostate cancer did not show a similar relationship. Patel et al. observed that cancer awareness campaigns aimed at men are not associated with a significant increase in interest in topics related to screening, risk factors, or cancer itself. Gender may be a potential cause of these observations. Women, as Internet users, are usually described as more likely to search for health topics and to be more aware of Internet use in this area. Interestingly, studies analyzing data from all countries show neither the intended effectiveness nor increased interest in many cancers. In contrast, studies analyzing national data, such as from Brazil and New Zealand, described an increased interest in conditions such as glaucoma, prostate cancer, and lung cancer. Finally, not only awareness events related to malignancies have been investigated with GT; a similar study was conducted on the effect of World Sepsis Day. Several studies analyzed the interest of Google users in cancers during the COVID-19 pandemic and compared the pre-pandemic period with the first months or years of the pandemic.
The analyzed studies showed a decrease in interest in many cancers and their screening programs after the emergence of the novel virus. The decreases in the RSVs of the analyzed search terms were generally consistent with simultaneous decreases in the number of diagnostic procedures or new cancer diagnoses. These studies are practical examples of using GT in emerging public health problems. GT was previously used to analyze outbreaks of infectious diseases. Here, we presented a series of GT studies showing how COVID-19 could have diminished the importance of cancer awareness, which may lead to increased excess mortality during the pandemic. Notably, some GT observations were corroborated by real-world data. Interestingly, the most commonly analyzed cancers are those (1) with screening programs or (2) with the highest prevalence worldwide. Search terms representing rare malignancies may be entered by only a small population of Google users. For this reason, the trends of rarely typed search terms could be susceptible to irregular fluctuations without specific secular or seasonal patterns, and such low search volumes could be problematic in statistical analysis and interpretation of the results. That problem could be overcome by using GT topics rather than search terms, which captures all queries related to the topic and allows larger regions to be included. However, only one in five included GT studies reported using topics instead of search terms. Correlation and peak analyses were the most prevalent methods among the included studies. These methods are widely used and easy for readers to understand. However, correlation analysis allows only the detection of associations, and many outcomes may represent incidental findings if not confronted with real-world data or strengthened by sensitivity analysis.
Regrettably, few studies analyzed secular or seasonal patterns, and only four performed forecasting. Our findings are similar to the results of Mavragani et al., who also found that correlation analysis was the most prevalent and forecasting analysis the least prevalent among studies utilizing GT for analyzing health phenomena. Forty percent of papers confronted their results with real-world data, and only nine reviewed studies applied sensitivity analysis. In our opinion, many GT studies that are neither compared with real-world data (e.g., epidemiological data) nor supported by appropriate sensitivity analyses are only a form of curiosity. In that form, GT data cannot form the basis of new recommendations or epidemiological/e-health tools; at most, they give a new perspective on some epidemiological issues. However, GT may provide insight into health phenomena that would otherwise require large-scale observations of uncertain clinical significance.
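As a minimal illustration of why an unsupported correlation may be an incidental finding, the following Python sketch applies one simple form of sensitivity analysis (leave-one-out) to a correlation between GT relative search volumes and real-world counts. It is not taken from any included study: the monthly values, the variable names, and the choice of Pearson correlation with leave-one-out resampling are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def leave_one_out(x, y):
    """Recompute the correlation with each observation dropped in turn."""
    return [
        pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Hypothetical monthly data, invented for illustration only.
# rsv: GT relative search volume (0-100) with a single peak in month 6.
# procedures: real-world counts (e.g., screening procedures) for the same months.
rsv = [20, 22, 19, 21, 20, 100, 23, 21, 20, 22, 19, 21]
procedures = [310, 300, 305, 298, 312, 420, 301, 309, 299, 303, 311, 300]

r_all = pearson(rsv, procedures)
r_loo = leave_one_out(rsv, procedures)

print(f"r using all months: {r_all:.2f}")
print(f"r range with one month dropped: {min(r_loo):.2f} to {max(r_loo):.2f}")
```

In this invented series, the strong overall correlation rests on a single peak month; once that month is dropped, the association disappears. A result that changes this much under a small perturbation of the data would warrant cautious interpretation.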
Our systematic review concerned retrospective studies on various topics, making it difficult to classify them uniformly. (…) In our review, we did not analyze whether the exact results of the papers could be reproduced or whether the authors drew the appropriate conclusions.
For this reason, we performed only descriptive statistics. Furthermore, there are no standards for conducting systematic reviews of studies utilizing GT data. Because the tool is novel, researchers must rely on their own diligence. However, this is one of the first systematic reviews of GT studies, and the methodological practice is still being formed.